Abstract

The problem of optimal allocation of monitoring resources for tracking transactions progressing through a distributed 
system, modeled as a queueing network, is considered. Two forms of monitoring information are considered, viz., 
locally unique transaction identifiers, and arrival and departure timestamps of transactions at each processing queue. 
The timestamps are assumed available at all the queues but in the absence of identifiers, only enable imprecise 
tracking since parallel processing can result in out-of-order departures. On the other hand, identifiers enable precise 
tracking but are not available without proper instrumentation. Given an instrumentation budget, only a subset of 
queues can be selected for production of identifiers, while the remaining queues have to resort to imprecise tracking 
using timestamps. The goal is then to optimally allocate the instrumentation budget to maximize the overall tracking 
accuracy. The challenge is that the optimal allocation strategy depends on accuracies of timestamp-based tracking at 
different queues, which has complex dependencies on the arrival and service processes, and the queueing discipline. 
We propose two simple heuristics for allocation by predicting the order of timestamp-based tracking accuracies 
of different queues. We derive sufficient conditions for these heuristics to achieve optimality through the notion 
of stochastic comparison of queues. Simulations show that our heuristics are close to optimality, even when the 
parameters deviate from these conditions. 

Keywords: Probabilistic transaction monitoring, Queueing networks, Stochastic comparison, Bipartite 
matching 



1. Introduction 

Transaction processing has been at the heart of information technology since the 1950s when the first
large online reservation system went into operation [?, 13]. Today transaction processing is at the core of
enterprise IT systems operated by telecommunication service providers, financial institutions and virtual
retailers. The scope of transaction processing has widened to incorporate multiple software components and
applications, servers, middleware, backend databases, and multiple information sources [?].

The growing complexity of transaction processing presents new challenges to system management and
support. Today's support helpdesks are no longer familiar with the intimate details of transaction
processing. The presence of heterogeneous components, legacy systems and third-party "black box"
components [8] makes debugging a slow and expensive ordeal. It is thus highly desirable to speed up debugging
through automated monitoring solutions.

Although tools may be available for independent troubleshooting within each of the components, they
cannot capture the entire life cycle of a transaction, and thus cannot support diagnosis at the transaction
level. Instead, an integrated end-to-end solution which tracks the entire path of transaction processing
is required [6]. An end-to-end monitor collects transaction records from different components and then
correlates or matches them to obtain the complete transaction path. If all the components are instrumented
properly, e.g., using techniques in [?, ?, ?], then each transaction record at every component is tagged with
a unique identifier corresponding to the transaction generating it. Using these identifiers, correlation of
transaction records at different components can then be done precisely.

Figure 1: Introducing identifiers to timestamps at queue $Q_0$ through instrumentation precisely tracks transactions progressing through
it. On the other hand, non-instrumented queues $Q_1$ and $Q_2$ have to track transactions using only arrival and departure timestamps
and may incur errors due to uncertainty in the order of departures.

In many practical scenarios, however, complete instrumentation of all the components is rarely the
norm. This is due to the presence of legacy systems and third-party components with monitors producing
incompatible transaction records, which, in effect, form a set of "black boxes". In the extreme case when
none of the components is instrumented, monitoring solutions have to fall back on other generic features
in the records such as timestamps to statistically "guess" the set of records likely generated by the same
transaction, and thereby infer the path taken by that transaction [?, ?], with the caveat that the results
may be erroneous.

Most real systems lie somewhere in the middle of the spectrum between the extreme scenarios of fully
instrumented and fully non-instrumented systems. In fact, most system integration and instrumentation is
a gradual process which starts from an ensemble of black boxes and slowly transitions to a system of "clear"
or "open" boxes as the support staff acquaint themselves with various components. Given sufficient time
and effort, skilled programmers are able to retrofit instrumentation¹ to components by injecting monitoring
code or building an extra layer of middleware [?]. A complete instrumentation, however, can incur daunting
costs and is wasteful in components where statistical tracking using timestamps already has
good accuracy. Our goal is then to systematically characterize the performance of partially instrumented
monitoring systems and identify components where retrofitting instrumentation is most required.

We answer the following questions: given a limited budget for instrumentation, what is the optimal 
allocation strategy to maximize overall accuracy of tracking transactions? What is the influence of various 
system parameters, such as the queueing arrival and the service rates, on the instrumentation strategy and 
the tracking accuracy? Are there simple easy-to-implement heuristics that also have good performance 
guarantees? What follows is a set of systematic answers to these questions. 

1.1. Technical Approach and Contributions 

We consider the problem of tracking transactions through a distributed system with limited instrumen- 
tation support. Our goal is to select an optimal subset of components for instrumentation under a budget 
constraint such that when combined with statistical tracking (using timestamps) at the non-instrumented 
components, the overall tracking accuracy is maximized. 

Our contributions are threefold. First, we analyze the accuracy of statistical tracking using timestamps
at a queue and characterize its dependency on different queueing parameters. Second, using these insights,
we propose two simple heuristics for the instrumentation allocation problem. Third, we derive sufficient
conditions for these heuristics to achieve optimality, based on the arrival and the service distributions at the
queues.

¹ Note that with partial instrumentation here the identifiers are local, defined only within each queue, which is different from the global
identifiers in fully instrumented systems [?].

Model: We model the progress of the transactions in a distributed system as a queueing network, where
each queue represents a system component. By default, we assume the availability of (an ordered) set of
arrival and departure timestamps at each queue while identifiers are only available upon instrumentation
(queue $Q_0$ in Fig. 1). Due to parallel processing of transactions, e.g., in infinite-server or processor-sharing
queues², the order of departures is not unique, and in the absence of identifiers, tracking transactions
through a queue requires statistical matching techniques. We analyze tracking accuracies using timestamps
under two simple statistical matching policies. Identifiers are available only upon instrumentation and by
instrumenting a queue, we mean injecting code or building a middleware wrapper which tags each timestamp
with an identifier unique to the transaction, leading to error-free tracking at those queues.

Formulation: Based on the above model, we formulate a resource allocation problem, where we opti- 
mally allocate the available amount of monitoring resources by selecting queues for instrumentation such 
that the overall tracking accuracy is maximized. The optimal allocation strategy thus selects queues for 
instrumentation in the increasing order of their timestamp-based tracking accuracies, until the budget con- 
straints are met. However, the exact expression of tracking accuracy at each non-instrumented queue is not 
tractable to compute in general, and has complex dependencies on the arrival and service statistics, and also 
on the queueing discipline. 

Heuristic Solutions: To overcome this obstacle, we propose two simple heuristics for instrumentation 
allocation which predict the order of the timestamp-based tracking accuracies at different queues without 
computing the exact expressions. The first heuristic predicts that the order of tracking accuracies is in the 
reverse order of their queueing load factors. The second heuristic predicts the order of accuracies using an 
approximation for the tracking accuracy, which becomes tight in the light load regime. The two heuristics 
represent different tradeoffs in that the load-factor heuristic requires only the knowledge of the queueing load 
factors while the approximation-based heuristic requires the full knowledge of arrival and service processes 
but is a more efficient allocation strategy (demonstrated through both theory and simulations). 

Optimality conditions: We provide sufficient conditions for these heuristics to achieve optimality, i.e., 
to correctly rank the order of the tracking accuracies, based on the notions of stochastic and convex orders 
of the arrival and service distributions of the queues. The conditions have intuitive explanations in terms 
of the rate and the "variability" of arrivals and services. In particular, these heuristics are always optimal 
when all the arrival distributions and all the service distributions belong to the same family. Simulations 
verify the optimality of our heuristics under the derived conditions and also show that our heuristics are 
close to optimality even when the parameters deviate from these conditions. 

Alternative Formulation: Besides allocating instrumentation resources, our heuristics are also appli- 
cable in other scenarios of monitoring. For instance, for a large system, the overhead in collecting timestamp 
records from all the components may be too large. In this case, the optimal monitoring resource allocation is 
to a priori select only a subset of components (queues) with the highest timestamp-based tracking accuracies 
for data collection. Our heuristics and their optimality guarantees are directly applicable here. 

Non-goals: We emphasize some of our "non-goals". Our formulation and solutions have a strong
theoretical foundation and are meant to provide guidelines for efficient instrumentation or data collection
in different scenarios. We do not attempt to replace existing instrumentation-based monitoring tools (see
Section 1.2 for a discussion) and exploit them when available. Our belief is that existing monitoring solutions
will have broader application by allowing for partial instrumentation, and we have a systematic approach for
pursuing it. Moreover, our solutions are not meant to automatically diagnose or correct faults, characterize
overall system performance, or provide real-time analysis, although such exercises can be carried out after
monitoring the transaction paths.



² There is no uncertainty in the order of departures for single-server queues with fixed order processing. Hence, their timestamp-based
tracking is error-free, and we do not consider them for allocation.

1.2. Related Work

The early literature on monitoring distributed systems relies on deep understanding of internal system
structures so that instrumentation code can be injected into proper places to record system activities at
process or object levels [?, ?, ?]. These solutions become difficult to implement in modern systems where
components are typically developed independently. Most existing monitoring solutions rely on certain types
of instrumentation that can expose the activities of interest [13], [14], [15], [6]. There are also a number of
commercially available products for monitoring and troubleshooting in distributed systems [?, ?, ?], which
are again based on instrumenting the system software.

While instrumentation provides reliable monitoring information, it has limited use in heterogeneous
systems where many components are from third-party vendors or legacy systems. One approach is to make
the instrumentation as component-independent as possible, e.g., by limiting changes to system code rather
than user-space code [17]. Another approach is to treat each component as a black box and only rely on
external activities of these black boxes for monitoring [?, ?]. These existing black-box based solutions
can be divided into two approaches: the identifier-based approach [?], which tags each incoming transaction
with a unique identifier that is associated with it throughout the system, converting the problem to the
instrumented case, and the trace-based approach [8], [16], which uses statistical techniques to extract monitoring
information from non-tagged activities. For example, [?] use messages between components to infer
causal paths and bottlenecks. We share a similar view as [8, 16] in that a monitoring solution should be as
non-intrusive and agnostic as possible to allow for broad application, especially in systems involving black
boxes, but there are two key differences that distinguish our work from this literature: (i) we are interested
in monitoring individual transactions rather than aggregate system behaviors such as causal paths and
bottlenecks, and (ii) we take a hybrid approach of using both passive monitoring (via timestamp-based
tracking) and instrumentation (that introduces identifiers), but treat the latter as a limited resource to be
allocated judiciously.

In [?], tracking of individual transactions in a distributed system based solely on timestamps is considered.
However, [?] focuses on developing optimal matching policies for timestamp-based transaction monitoring,
whereas we focus on the comparison of tracking accuracies at different subsystems while leveraging the statistical
matching policies discussed in [?] for tracking in the non-instrumented states. The stochastic comparison
techniques used in this paper have a rich history and have been applied to compare different queueing parameters
such as delay and throughput [?, Ch. 14]. To the best of our knowledge, comparison of monitoring
accuracies at different queues has not been considered before.

Organization: The paper is organized as follows. In Section 2 we describe the system model and problem
formulation. In Section 3 we analyze the policies for matching timestamps. In Section 4 we propose the two
heuristics for monitoring resource allocation. In Section 5 we introduce the notion of stochastic comparison.
In Section 6 we derive sufficient conditions for the optimality of the two heuristics for networks of
infinite-server queues. Section 7 deals with extensions to general product-form queues. In Section 8 we evaluate
the efficiency of the heuristics through simulations. Section 9 concludes our paper.



2. System Model and Formulation 

We now describe the queueing model in detail and then formulate the problem of optimal monitoring
resource allocation. Before we proceed, here are a few comments regarding the notation used in this paper.
Vectors are represented by boldface, e.g., $\mathbf{X}$, and $\mathbf{X}(i)$ is its $i$-th element. Let $f_X(x)$, $F_X(x)$ and $\bar{F}_X(x)$
denote the probability density function (pdf), cumulative distribution function (cdf) and complementary
cumulative distribution function (ccdf) of a continuous variable $X$. Let $\mathbb{E}[X]$ denote its expectation and let
$\mathrm{supp}(f_X)$ denote the support of $f_X$.

2.1. System Model 

We consider a queueing network, and initially limit ourselves to the case where all the queues are infinite server
(GI/GI/$\infty$). The interarrival and service times are drawn i.i.d. from general continuous pdfs $f_X$ and $f_T$. In
Section 7 we generalize some of our results to the product-form queues. We assume that the sequence
of queues visited by each transaction is a Markov chain, and the service is independent of the transition
sequence. The list of notations for different queueing parameters is given in Table 1. The propagation
delays and synchronization errors between different queues are assumed independent of the service or arrival
realizations.

Table 1: Symbol list. Subscript $k$ refers to queue $Q_k$.

Given a set of ordered arrival and departure timestamps, $\mathbf{Y}_k$ and $\mathbf{D}_k$, at queue $Q_k$, there is a relationship
between the service times and the true matching $\pi_k^*$ between the arrivals and the departures, as

$$T_k(i) = D_k(\pi_k^*(i)) - Y_k(i), \quad i \in \mathbb{N}. \tag{1}$$

Hence, $\pi_k^*(i)$ is the rank of the departure timestamp corresponding to the $i$-th arrival to the queue $Q_k$. Since we
have access to only the arrival and departure timestamps $\mathbf{Y}_k$ and $\mathbf{D}_k$, and not to the actual service times
$\mathbf{T}_k$, the true matching $\pi_k^*$ is unknown. A bipartite matching policy $\gamma$ comes up with a probable matching $\pi^\gamma$
between the arrival timestamps $\mathbf{Y}_k$ and the departure timestamps $\mathbf{D}_k$, which yields correct matchings with
a certain degree of accuracy, and is discussed in detail in Section 3. In addition, we assume that identical
policies $\gamma$ are employed for matching at all the queues to facilitate comparison of their tracking accuracies.
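
To make relation (1) concrete, the following minimal Python sketch recovers the service times implied by a candidate matching. The function name `implied_service_times`, the 0-based ranks, and the sample timestamps are illustrative assumptions and are not taken from the paper.

```python
def implied_service_times(arrivals, departures, matching):
    """Service times implied by a candidate matching, following relation (1).

    arrivals[i]   : i-th ordered arrival timestamp Y(i)
    departures[j] : j-th ordered departure timestamp D(j)
    matching[i]   : 0-based rank of the departure assigned to arrival i
    """
    if sorted(matching) != list(range(len(arrivals))):
        raise ValueError("matching must be a permutation of 0..b-1")
    return [departures[matching[i]] - arrivals[i] for i in range(len(arrivals))]


# Hypothetical busy period with two transactions (made-up timestamps).
arrivals = [0.0, 1.0]
departures = [1.5, 3.0]

fifo = implied_service_times(arrivals, departures, [0, 1])     # in-order match
swapped = implied_service_times(arrivals, departures, [1, 0])  # out-of-order match
print(fifo, swapped)
```

In this made-up example both candidate matchings imply nonnegative service times, so the timestamps alone cannot distinguish them; this is the ambiguity that a matching policy has to resolve.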

Our analysis will be on a typical busy period, i.e., a period of time starting from an empty queue until
the next time the queue becomes free, as shown in Fig. 2. Let $P^\gamma(k)$ be the probability that the policy $\gamma$
outputs a correct matching between all the arrivals and departures in a typical busy period at queue $Q_k$.
We use $P^\gamma(k)$ as the measure of timestamp-based tracking accuracy, given by

$$P^\gamma(k) = \sum_{b=1}^{\infty} \mathbb{P}[\pi_k^\gamma = \pi_k^*, \, B_k = b], \tag{2}$$

where $B_k$ is the number of transactions in the busy period.

2.2. Problem Formulation 

We are now ready to state the problem of optimal monitoring resource allocation. Given a budget
constraint of instrumenting at most $E$ queues to enable precise tracking through the production
of identifiers, our goal is to select $E$ queues in $\mathcal{Q}$ such that the overall tracking accuracy is
maximized. For each queue $Q_k$, let $z_k \in \{0, 1\}$ indicate whether it is selected for instrumentation. Then,
the effective tracking accuracy at queue $Q_k$ after instrumentation decisions is

$$z_k + (1 - z_k) P^\gamma(k),$$

since the tracking accuracy is unity when identifiers are available and $P^\gamma(k)$ is the accuracy based on using
only timestamps under a statistical matching policy $\gamma$. Formally, the optimization is

$$\mathbf{z}^*(E; \mathcal{Q}) := \arg\max_{\mathbf{z}} \sum_{Q_k \in \mathcal{Q}} \{ z_k + (1 - z_k) P^\gamma(k) \}, \tag{3}$$

$$\text{s.t.} \quad \sum_{Q_k \in \mathcal{Q}} z_k \le E, \quad z_k \in \{0, 1\}, \quad \mathbf{z} := \{ z_k : Q_k \in \mathcal{Q} \}.$$

Figure 2: Random arrivals and departures lead to random busy period sizes. (a) Busy period $B = 1$. (b) Busy period $B = 2$.

We can see that the optimal allocation strategy is to select the $E$ queues with the lowest timestamp-based
tracking accuracies $P^\gamma$. The challenge, as we will see, is in finding the tracking accuracy $P^\gamma$ since it
has complex dependencies on the arrival and service processes.
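
If the accuracies were known, the optimization in (3) would reduce to a sort. The sketch below illustrates this under that assumption; the function names and the accuracy values are hypothetical, and the brute-force search is included only to cross-check the greedy choice on a toy input.

```python
from itertools import combinations


def allocate_instrumentation(accuracy, budget):
    """Pick `budget` queues with the lowest timestamp-based accuracy."""
    ranked = sorted(accuracy, key=accuracy.get)  # lowest accuracy first
    return set(ranked[:budget])


def overall_accuracy(accuracy, instrumented):
    """Objective of (3): instrumented queues count as 1, the others as P(k)."""
    return sum(1.0 if q in instrumented else p for q, p in accuracy.items())


def brute_force_allocation(accuracy, budget):
    """Exhaustive search over all subsets of size `budget` (small inputs only)."""
    return max(
        (set(c) for c in combinations(accuracy, budget)),
        key=lambda chosen: overall_accuracy(accuracy, chosen),
    )


# Hypothetical accuracies for three queues (illustrative values only).
accuracy = {"Q0": 0.40, "Q1": 0.90, "Q2": 0.70}
chosen = allocate_instrumentation(accuracy, budget=1)
print(chosen, overall_accuracy(accuracy, chosen))
```

Only the order of the accuracies enters `allocate_instrumentation`, which is why the heuristics developed later aim at predicting that order rather than the values themselves.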



3. Timestamp-based Tracking 

In this section, we describe the matching policies $\gamma$ employed for associating the arrival and the departure
timestamps at a queue, and perform some preliminary analysis on the tracking accuracy of a policy.

3.1. Bipartite Matching Policies 

We now briefly describe two matching policies $\gamma$ that can be employed to match timestamps in the
absence of identifiers, viz., the first-in first-out (FIFO) rule and the random matching rule. The relative
performance of these policies depends on the arrival and service statistics. These policies are non-parametric,
in the sense that they require minimal knowledge about the service statistics for implementation.

Perhaps the simplest matching rule between the arrival and departure timestamps is the FIFO rule,
which is an in-order matching rule, i.e., for a given busy-period size $B = b$, we have a fixed rule $\pi^{\mathrm{FIFO}} = \mathbf{I}$,
where $\mathbf{I} := [1, 2, \ldots, b]^T$ is the identity vector. The FIFO matching rule is fully distribution-free: it does not
require the knowledge of arrival or service distribution and is always valid. By valid, we mean that the FIFO
match has a strictly positive likelihood of being the true match between the arrivals and the departures. An
expression for the expected matching accuracy under the FIFO rule can be found in Appendix A.

In addition to the FIFO matching rule, we consider another simple rule called random matching, where
given a realization of arrivals and departures in a busy period, we uniformly pick a valid matching among all
possible matchings. The random matching rule is almost distribution-free: it only requires the knowledge
of $\mathrm{supp}(f_{T_k})$, the support of the service pdf, in order to ensure the validity of different matchings. This is
because a valid matching $\pi$ at queue $Q_k$ in a busy period of size $B_k = b$ satisfies

$$\pi : \prod_{i=1}^{b} f_{T_k}[D_k(\pi(i)) - Y_k(i)] > 0, \tag{4}$$

and the above expression only requires the knowledge of the support bounds. An expression for tracking
accuracy $P^{\mathrm{RAND}}$ under random matching is given in Appendix B.
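
The two non-parametric rules can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the service support is an interval `(lo, hi)`, uses 0-based ranks, enumerates all permutations (so it is practical only for small busy periods), and every function name and timestamp is hypothetical.

```python
import itertools
import math
import random


def valid_matchings(arrivals, departures, support=(0.0, math.inf)):
    """All matchings whose implied service times lie in the service support, cf. (4)."""
    lo, hi = support
    b = len(arrivals)
    return [
        perm
        for perm in itertools.permutations(range(b))
        if all(lo <= departures[perm[i]] - arrivals[i] <= hi for i in range(b))
    ]


def fifo_matching(b):
    """In-order (identity) matching for a busy period of size b."""
    return tuple(range(b))


def random_matching(arrivals, departures, support=(0.0, math.inf), rng=random):
    """Uniformly pick one of the valid matchings."""
    return rng.choice(valid_matchings(arrivals, departures, support))


# Hypothetical busy period with two overlapping transactions.
arrivals = [0.0, 1.0]
departures = [2.0, 3.0]
print(valid_matchings(arrivals, departures))              # unbounded support
print(valid_matchings(arrivals, departures, (0.0, 2.5)))  # bounded support
```

With an unbounded support both matchings are valid in this example, whereas the bound of 2.5 rules out the out-of-order matching because it would imply a service time of 3.0.
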

In contrast to the non-parametric FIFO and random matching rules, the parametric maximum-likelihood
matching rule [?] requires the full knowledge of the service distribution. The maximum-likelihood rule is
defined as the rule which maximizes the probability of correctly matching all the arrivals and departures.
However, it is not tractable to analyze this rule since it is fully adaptive to the realization of arrivals and
departures, and depends on the arrival and service statistics in a complex manner. In many cases, the simple
FIFO and random matching policies coincide with the maximum-likelihood rule or have close to optimal
performance, as discussed below.

The effectiveness of using the FIFO or the random policy crucially depends on the nature of the service
distribution (for a given realization of arrivals). For instance, under light-tailed services, the probability of
out-of-order departures is small and hence, the FIFO rule is expected to have good tracking accuracy. In
fact, for the Weibull³ family of distributions with shape parameter greater than one (and hence, light tailed),
FIFO is the optimal matching policy, coinciding with the maximum-likelihood rule. More generally, the FIFO
rule is optimal whenever the service pdf is log-concave [?].

For heavy-tailed distributions, on the other hand, the chances of out-of-order departures are high, and
the FIFO rule is not close to the maximum-likelihood rule. In this case, the random matching rule may
have better tracking accuracy than the FIFO rule. This is observed in our simulations in Fig. 5b for the Weibull
distribution with shape parameter smaller than one. Moreover, random matching is optimal in case of batch
arrivals to the infinite-server queue where all possible matchings between the arrivals and departures are
equally likely, although we do not study this scenario in the paper. Hence, the relative performance of the FIFO
and random matching rules depends on the service distribution.

3.2. Tracking Accuracy 

Recall that we take the probability of matching all timestamps in a typical busy period to be the measure
of tracking accuracy. Perhaps a more straightforward measure of accuracy is the probability of correctly
matching only a typical pair of arrival and departure timestamps. This, however, depends on the probability
of correctly matching other arrivals and departures. On the other hand, the matching across busy periods
is independent, since a valid matching between arrival and departure timestamps occurs only within busy
periods, not across them (see Fig. 3). Hence, the probability of correct matching in a typical busy period, $P^\gamma$,
is the relevant measure for tracking accuracy.

The challenge is in computing $P^\gamma$ in (2). Consider FIFO matching as an example. Its accuracy is equal
to (see Appendix A)

$$P^{\mathrm{FIFO}} = \sum_{b=1}^{\infty} \mathbb{P}\Big( \bigcap_{i=1}^{b-1} \{ T(i) \in [X(i), X(i) + T(i+1)] \} \cap \{ T(b) < X(b) \} \Big),$$

where the events $T(i) \in [X(i), X(i) + T(i+1)]$ and $T(b) < X(b)$ cannot be evaluated separately since they are
correlated with one another. We can see that the expression becomes intractable as we increase $b$, the
size of the busy period.
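
Although the expression above is hard to evaluate in closed form, the FIFO accuracy of a single infinite-server queue can be estimated by simulating busy periods. The following Monte Carlo sketch is an illustration under assumed exponential interarrival and service times; the sampler interface, the rates, and the number of simulated busy periods are all hypothetical choices and not the simulation setup of the paper.

```python
import random


def simulate_busy_period(draw_interarrival, draw_service, rng):
    """Arrival and departure times of one busy period of an infinite-server queue."""
    arrivals = [0.0]
    departures = [draw_service(rng)]
    busy_until = departures[0]
    t = 0.0
    while True:
        t += draw_interarrival(rng)
        if t >= busy_until:  # queue emptied before the next arrival
            return arrivals, departures
        arrivals.append(t)
        departures.append(t + draw_service(rng))
        busy_until = max(busy_until, departures[-1])


def fifo_accuracy(draw_interarrival, draw_service, n_periods=5000, seed=0):
    """Fraction of simulated busy periods whose departures are in arrival order."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_periods):
        _, dep = simulate_busy_period(draw_interarrival, draw_service, rng)
        correct += all(dep[i] <= dep[i + 1] for i in range(len(dep) - 1))
    return correct / n_periods


def exp_sampler(rate):
    """Sampler for an exponential variable with the given rate."""
    return lambda rng: rng.expovariate(rate)


light = fifo_accuracy(exp_sampler(0.1), exp_sampler(1.0))         # light load
heavy = fifo_accuracy(exp_sampler(2.0), exp_sampler(1.0))         # heavier load
deterministic = fifo_accuracy(exp_sampler(2.0), lambda rng: 1.0)  # fixed service
print(light, heavy, deterministic)
```

FIFO is counted as correct exactly when the departures within a busy period occur in arrival order, so deterministic service always yields an estimate of one in this sketch.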

More generally, a matching policy $\gamma$ may select any one of the valid matchings or permutations with a
certain probability, and the tracking accuracy $P^\gamma$ from (2) becomes

$$P^\gamma = \sum_{b=1}^{\infty} \sum_{\pi_b} \mathbb{P}[\pi^\gamma = \pi^* = \pi_b, \, B = b],$$

where the inner sum is over all the permutation vectors $\pi_b$ over $\{1, 2, \ldots, b\}$. Since there are $b!$
permutation vectors, the number of computations grows at least exponentially in $b$.

It is therefore not tractable to compute the tracking accuracies $P^\gamma(k)$ at different queues $Q_k$ in order to
find the optimal resource allocation strategy in (3). Moreover, it is useful to obtain some general guidelines
about the influence of different queueing parameters on the resulting tracking accuracy. Fortunately, we
note that we do not need to know the exact accuracies at different queues in the network to obtain the
optimal solution to instrumentation allocation in (3). In fact, it suffices to know the relative order of these
accuracies. The goal of this paper is to establish simple heuristics that can be used to infer the order of
matching accuracies without directly computing them. To this end, we now propose two approaches with
different complexities and generality. Later, in Sections 6 and 7, we derive sufficient conditions for these
heuristics to achieve optimality according to (3).

³ The pdf of a Weibull variable is $f(x) = \frac{w}{c} \left(\frac{x}{c}\right)^{w-1} \exp\left(-\left(\frac{x}{c}\right)^{w}\right)$ for $x > 0$, where $w$ and $c$ are the shape and scale parameters. When
$w > 1$, the distribution is light tailed, when $w < 1$, it is heavy tailed, and $w = 1$ is the exponential distribution.

Figure 3: Matching arrival and departure timestamps decomposes across different busy periods.

4. Two Heuristics for Optimal Resource Allocation 

We propose two approaches to instrumentation allocation through prediction of the order of timestamp-based
tracking accuracies $P^\gamma(k)$ at different queues $Q_k$. One approach is to avoid computation of $P^\gamma(k)$
altogether and instead infer their order through simple queueing parameters such as the load factors. The
other approach is to approximately compute $P^\gamma(k)$ by only considering small busy-period sizes. Both these
simple approaches instrument queues independently of the policies $\gamma$ employed for timestamp matching. We
now describe these two approaches in detail.

4.1. Approach 1: Order of Load Factors

The load factor $\rho_k = \lambda_k / \mu_k$ of a queue $Q_k$, which is the ratio of the arrival rate $\lambda_k$ to the service rate
$\mu_k$, is perhaps the most commonly used queueing parameter for performance evaluation of queues. We
propose the load-factor heuristic for instrumentation allocation which selects queues for instrumentation
in the decreasing order of their load factors until the budget constraint is met. The load-factor heuristic
is robust since the selected set of queues is invariant under small perturbations in the arrival and service
statistics.

The load-factor heuristic predicts queues with higher load factors to have lower timestamp-based tracking 
accuracies. This is intuitive since a lighter load implies a smaller number of simultaneously-served arrivals in 
the infinite-server queue on average leading to a lower uncertainty in the order of departures. The intuition, 
however, does not extend when we consider queues with different arrival and service distributions. The 
arrival and service processes influence the tracking accuracy in a complex manner, and the load factor may 
not always capture the required effects for comparison of tracking accuracies at different queues. 

A simple example is two queues with the same arrival rate, one with uniform service $\mathrm{Unif}(0, 2m)$ on support
$[0, 2m]$ and the other with deterministic service of value $m_d > m$. Here, the load-factor heuristic incorrectly
predicts the deterministic service to have worse tracking accuracy, while it in fact has perfect
accuracy. Hence, the load-factor heuristic is not universally optimal for instrumentation allocation.
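
The load-factor heuristic and the counterexample above can be sketched as follows. The function name and the numerical values ($m = 1.0$, $m_d = 1.2$, unit arrival rates) are assumptions made purely for illustration.

```python
def load_factor_heuristic(arrival_rate, mean_service, budget):
    """Select `budget` queues in decreasing order of load factor rho = lambda * E[T]."""
    load = {q: arrival_rate[q] * mean_service[q] for q in arrival_rate}
    ranked = sorted(load, key=load.get, reverse=True)  # highest load first
    return ranked[:budget], load


# Counterexample from the text with assumed values m = 1.0 and m_d = 1.2.
arrival_rate = {"uniform": 1.0, "deterministic": 1.0}
mean_service = {"uniform": 1.0, "deterministic": 1.2}  # Unif(0, 2m) has mean m
selected, load = load_factor_heuristic(arrival_rate, mean_service, budget=1)
print(selected, load)
```

With these values the heuristic spends its single unit of budget on the deterministic-service queue, whose departures are always in order, which is the failure mode described in the example.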

An intuitive reason for the sub-optimality of the load-factor heuristic is that there are two sources of
errors impacting the tracking accuracy: variability in service times leading to uncertainty in the order of
departures and high load factor resulting in more simultaneous servicing in infinite-server queues on average.
The load-factor heuristic only captures the latter effect and completely ignores the former. As we saw in
the above example, simultaneous servicing does not always lead to poor accuracy, which is also governed by the
variability in the service times.

In many cases, different subsystems in a distributed system may have similar service distributions (such
as from the same family), but with different load factors. Here, the load-factor heuristic may correctly
predict the order of the tracking accuracies. We prove a sufficient set of conditions for the optimality of the
load-factor heuristic in Section 6 by precisely investigating the dependency of the tracking accuracy on the
arrival and the service processes.

4.2. Approach 2: Small-Batch Approximation

The load-factor heuristic described in the previous section avoids computation of the tracking accuracy 
altogether. We now propose an alternative heuristic which approximates tracking accuracy through a simple 
expression, and makes instrumentation decisions based on the approximation. We later demonstrate the 
superiority of this heuristic over the load-factor heuristic, both through theory and simulations. 

The approximation for tracking accuracy is based on the series expansion

$$P^\gamma(k) = \mathbb{P}[B_k = 1] + \sum_{b=2}^{\infty} P_b^\gamma(k) \, \mathbb{P}[B_k = b], \tag{5}$$

where $P_b^\gamma(k)$ denotes the probability of correct matching in a busy period of size $b$, and $P_1^\gamma(k) = 1$ since
when there is only one transaction in the busy period, tracking is perfect. Under
sufficient variability of the service times (i.e., not deterministic services), the probability of correct matching
typically decays with the busy-period size,

$$\lim_{b \to \infty} P_b^\gamma(k) = 0,$$

since the number of possible matchings grows exponentially with the busy-period size $b$ and we make an error
almost surely as the busy-period size goes to infinity. Hence, the terms corresponding to larger busy-period
sizes in (5) can be dropped and an approximate tracking accuracy can be efficiently computed by limiting
to small busy-period sizes.

The simplest approximation is when we ignore all the terms in (5) except for the first one, which is simple
to evaluate. We refer to this as the unit-batch approximation and use it to allocate instrumentation resources
to queues. Note that the unit-batch approximation is slightly more complex than the load-factor heuristic.
We demonstrate, both through theory and simulations, that this leads to superior performance over the
load-factor heuristic; the intuition being that this heuristic captures additional features of the arrival and
service statistics.

At low arrival rate, this approximation (and also more refined ones with more terms) becomes tight in 
the limit. Intuitively, at low arrival rates, the dominant event is having a single arrival in each busy period 
since the arrivals are widely separated on average. 
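
The low-arrival-rate behavior described above can be explored numerically. The following sketch simulates an infinite-server queue with Poisson arrivals and reports, per busy period, the fraction of unit-sized busy periods and the fraction in which FIFO matching is entirely correct. It assumes that a busy period ends when the system empties and that a busy period counts as correctly tracked only if all its departures leave in arrival order; the function name and the Weibull example are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_queue(lam, sample_service, n=50_000, seed=0):
    """Return (fraction of unit busy periods, fraction of busy periods tracked by FIFO)."""
    rng = np.random.default_rng(seed)
    arrivals = np.cumsum(rng.exponential(1.0 / lam, n))
    departures = arrivals + sample_service(rng, n)
    # A new busy period starts when an arrival finds the system empty,
    # i.e. it arrives after every earlier transaction has departed.
    latest_departure = np.maximum.accumulate(departures)
    is_start = np.r_[True, arrivals[1:] > latest_departure[:-1]]
    period = np.cumsum(is_start) - 1            # busy-period index of each transaction
    sizes = np.bincount(period)
    # A transaction that departs before its predecessor breaks FIFO matching.
    overtaken = np.r_[False, departures[1:] < departures[:-1]] & ~is_start
    wrong = np.bincount(period, weights=overtaken.astype(float)) > 0
    return float(np.mean(sizes == 1)), float(1.0 - np.mean(wrong))

# Assumed example: Weibull service with shape 0.5 and unit scale.
service = lambda rng, n: rng.weibull(0.5, n)
for lam in (1.0, 0.1, 0.01):
    unit, accuracy = simulate_queue(lam, service)
    print(lam, unit, accuracy)
```

Since every unit-sized busy period is tracked correctly, the first returned value never exceeds the second, and the two are expected to approach each other as `lam` decreases.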

Proposition 1. (Tightness at Low Arrival Rate). As the arrival rate $\lambda_k$ to a queue $Q_k$ goes to zero,
and the service distribution is kept fixed, we have

$$\lim_{\lambda_k \to 0} \frac{P^{\gamma}(k)}{P[B_k = 1]} = 1. \qquad (6)$$

Proof: As $\lambda_k \to 0$, we have $P[B_k = 1] = P[X_k > T_k] \to 1$ and $P^{\gamma}(k) \to 1$ since the
probability of out-of-order departures goes to zero. □

Hence, the tracking accuracy $P^{\gamma}$ is well approximated by the probability of a unit busy period in the
low-arrival-rate or light-load regime. Beyond this regime, simulations in Section 8 show that the unit-batch
approximation correctly captures the trend of $P^{\gamma}$ and is hence an efficient strategy for instrumentation
allocation over a wider regime of loads.

5. Preliminaries: Stochastic Comparison

We have so far proposed two simple heuristics for optimal instrumentation resource allocation which 
circumvent the challenges in computing the tracking accuracies at various queues. Our goal is to establish 
a general set of conditions on the arrival and service processes, under which these simple heuristics coincide 
with the optimal allocation strategy. To this end, we introduce the notion of stochastic comparison of 
random variables. 

Perhaps the simplest notion of comparing two random variables is through their mean values. But very 
often, this comparison turns out to be too loose to draw useful conclusions since the probability distribution of 
the two variables can be very different. In the context of this paper, comparing only queueing load factors, 
which is just the average system behavior, is not enough to always guarantee an order of the tracking 
accuracies of the queues and hence, optimality of the load-factor heuristic for instrumentation allocation. 

Instead, we impose stronger constraints on the distributions of the variables under comparison to obtain 
useful conclusions. Here, we employ two notions of stochastic comparison, viz., the stochastic order and 
the convex order. The stochastic order is a stronger form of comparing the mean values, while the convex 
order is a stronger form of comparing the variances of random variables. The detailed definitions are given
in Appendix C. We use these notions in Section 6 to compare tracking accuracies at different queues, and
to derive sufficient conditions for the optimality of the two proposed heuristics for resource allocation.

5.1. Stochastic Comparison of Busy Periods 

We now provide some preliminary results on comparing the busy-period sizes of queues under stochastic
or convex orders of arrival and service processes. We use these results in Section 6 to obtain an order on the
tracking accuracies of the queues, thereby establishing the optimality of our heuristics for instrumentation
allocation.

We now show that under a stochastic order of arrival processes and service processes at two queues, we
can guarantee a stochastic order of the size of their busy periods.

Lemma 1. (Comparison of Busy Periods under Stochastic Order). For two $GI/GI/\infty$ queues
$Q_k, Q_m$ with i.i.d. inter-arrival times $X_k, X_m$ and i.i.d. service times $T_k, T_m$, we have

$$X_k \overset{st}{\le} X_m, \quad T_k \overset{st}{\ge} T_m \;\Rightarrow\; B_k \overset{st}{\ge} B_m. \qquad (7)$$

Proof: See Appendix D. □

The above result confirms our intuition that the size of the busy period increases with faster arrivals and
slower services (and hence, higher load factors), formalized under the notion of stochastic order.
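
The ordering in (7) can be illustrated empirically. The sketch below simulates busy-period sizes for two infinite-server queues whose inter-arrival and service times are exponential (exponential distributions with different means are stochastically ordered) and compares the empirical tails $P[B > b]$. The helper names and the specific parameter values are assumptions chosen for illustration, and a busy period is assumed to end when the system empties.

```python
import numpy as np

def busy_period_sizes(sample_interarrival, sample_service, n=100_000, seed=0):
    """Simulate an infinite-server queue and return the sizes of its busy periods."""
    rng = np.random.default_rng(seed)
    arrivals = np.cumsum(sample_interarrival(rng, n))
    departures = arrivals + sample_service(rng, n)
    latest_departure = np.maximum.accumulate(departures)
    # An arrival starts a new busy period if all earlier transactions have departed.
    starts = np.flatnonzero(np.r_[True, arrivals[1:] > latest_departure[:-1]])
    return np.diff(np.r_[starts, n])

def tail(sizes, b):
    """Empirical estimate of P[B > b]."""
    return float(np.mean(sizes > b))

# Queue k: faster arrivals (mean gap 0.5) and slower service (mean 1.0).
sizes_k = busy_period_sizes(lambda r, n: r.exponential(0.5, n),
                            lambda r, n: r.exponential(1.0, n))
# Queue m: slower arrivals (mean gap 1.0) and faster service (mean 0.5).
sizes_m = busy_period_sizes(lambda r, n: r.exponential(1.0, n),
                            lambda r, n: r.exponential(0.5, n))
for b in (1, 2, 5):
    print(b, tail(sizes_k, b), tail(sizes_m, b))
```

Under these assumptions, each printed tail for queue k should be at least as large as the corresponding tail for queue m, up to simulation noise.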

We now consider an alternative scenario where one queue has a higher (normalized) service variability
than the other, formalized by the presence of a convex order. We show that this also implies a stochastic
order on their busy-period sizes for the special case of Poisson arrivals at all the queues.

Lemma 2. (Comparison of Busy Periods under Convex Order & Poisson Arrivals). For two
$M/GI/\infty$ queues $Q_k, Q_m$ with i.i.d. Poisson arrivals with rates $\lambda_k, \lambda_m$ and i.i.d. service
times $T_k, T_m$, we have

$$\lambda_k T_k \overset{cx}{\le} \lambda_m T_m \;\Rightarrow\; B_k \overset{st}{\le} B_m. \qquad (8)$$

Proof: See Appendix E. □

Informally, the above result states that a more variable service distribution (normalized by the arrival
rate) results in larger busy periods.

The results in (7) and (8) form an integral component of our proofs in the comparison of tracking
accuracies, since larger busy periods lead to lower tracking accuracies. However, we see in the subsequent
sections that certain conditions, in addition to stochastic or convex orders of arrivals and services,
are needed to guarantee the order of the tracking accuracies, and hence, optimality of our heuristics for
instrumentation allocation.

6. Optimality in $GI/GI/\infty$ Queues



6.1. Load-Factor Heuristic 

Recall that the load-factor heuristic, described in Section 4.1, predicts queues with higher load factors to
have lower timestamp-based tracking accuracy and hence, selects them for introducing identifiers through
instrumentation. We now provide sufficient conditions on the arrival and service processes under which the
load-factor heuristic is the optimal resource allocation strategy.

A stochastic order on the arrival and the service times is a prerequisite condition in our approach since
it leads to a stochastic order on the busy periods from Lemma 1. In addition to the stochastic order on the
arrival and service processes, we need additional conditions to establish the order of the tracking accuracies,
depending on the matching policy employed. These additional conditions turn out to be different for the
FIFO and the random matching rules. This is because the tracking accuracies of the two rules are sensitive
to different kinds of events. For the FIFO rule, any out-of-order departure results in an error, which implies
its sensitivity to the spread of the service distribution, defined precisely in Section 6.1.1. On the other hand,
random matching is somewhat less sensitive to the service spread since it uniformly picks a matching out of
all valid matchings, and this is reflected in our results. We first provide sufficient conditions for optimality
of the load-factor heuristic under the FIFO rule and then consider the random matching rule. Finally, in
Section 6.1.3, we provide examples where these conditions are satisfied.

6.1.1. Optimality Under FIFO Matching Rule 

We now provide conditions for the optimality of the load-factor heuristic when FIFO is the matching
policy employed at all the queues. Since overtaking or out-of-order departures cause errors in FIFO matching,
we relate the tendency for overtaking to the spread of the service distribution, given by

$$V_k := T_k(1) - T_k(2), \qquad (9)$$

where $T_k(1)$ and $T_k(2)$ are independent samples of the service time $T_k$ at queue $Q_k$. Note that
$V_k = 0$ if the service is deterministic. The spread of a distribution is thus related to the variability; a more
"spread out" service distribution has higher variability, and thus has a higher tendency for generating
out-of-order departures.
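
To make the spread in (9) concrete, the sketch below draws samples of $V_k$ for a deterministic service and for an exponential service. The helper `spread_samples` and the chosen rate are assumptions for illustration. For exponential service with rate $\mu$, the absolute spread $|V_k|$ is again exponential with rate $\mu$ (the difference of two i.i.d. exponentials is Laplace distributed), which gives a simple check on the simulation.

```python
import numpy as np

def spread_samples(sample_service, n=200_000, seed=0):
    """Draw samples of V = T(1) - T(2) for independent service times T(1), T(2)."""
    rng = np.random.default_rng(seed)
    return sample_service(rng, n) - sample_service(rng, n)

# Deterministic service: the spread is identically zero.
v_det = spread_samples(lambda rng, n: np.full(n, 1.0))

# Assumed example: exponential service with rate mu, so E|V| = 1 / mu.
mu = 2.0
v_exp = spread_samples(lambda rng, n: rng.exponential(1.0 / mu, n))

print(np.max(np.abs(v_det)), np.mean(np.abs(v_exp)), 1.0 / mu)
```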

We now show the main result that the order of the tracking accuracies under the FIFO rule follows the reverse
order of the load factors in the presence of a stochastic order.

Theorem 1. (Optimality of Load-Factor Heuristic Under FIFO Rule). At queues $Q_k, Q_m$, under
a stochastic order on arrival times $X_k$ and $X_m$, service times $T_k$ and $T_m$ and their spreads $V_k$ and
$V_m$, we have

$$X_k \overset{st}{\le} X_m, \quad T_k \overset{st}{\ge} T_m, \quad |V_k| \overset{st}{\ge} |V_m|
\;\Rightarrow\; \rho_k \ge \rho_m, \quad P^{FIFO}(k) \le P^{FIFO}(m). \qquad (10)$$

Hence, if the arrival, service and service spread distributions at all the queues satisfy the above stochastic
order, then the load-factor heuristic for allocation of instrumentation resources is optimal, according to
optimization in (3).

Proof: See Appendix F. □

Hence, slower arrivals, faster services (which thus imply a lower load factor), and lower service spreads
result in more accurate tracking under the FIFO rule, when the comparison is formalized by the notion of
stochastic order.

The combined conditions of service speed and spread in (10) place constraints on the service distributions
under comparison. Informally, we need one service to be simultaneously slower and more spread out than
the other, i.e., one service distribution has more probability mass concentrated closer to zero than the other.
For example, the Weibull distribution with different shape parameters but same scale parameter satisfies
this condition, as shown in Fig. 4.

Figure 4: Comparison of two Weibull distributions with pdf
$f(x) = \frac{w}{c}\left(\frac{x}{c}\right)^{w-1} \exp\left(-\left(\frac{x}{c}\right)^{w}\right)$ for $x > 0$. The distribution with
lower shape parameter $w$ has higher FIFO tracking accuracy. See Theorem 1.



6.1.2. Optimality Under Random Matching 

We now provide sufficient conditions for optimality of the load-factor heuristic when the random matching
rule is employed for matching arrival and departure timestamps at all the queues. Recall that the random
matching rule uniformly chooses a matching among all valid matchings in the busy period.

We now show the main result of this section that the order of the tracking accuracies under the random
matching rule follows the reverse order of the load factors in the presence of a stochastic order.

Theorem 2. (Optimality of Load-Factor Heuristic Under Random Matching Rule). At queues
$Q_k, Q_m$, under the random matching rule with arrival times $X_k$ and $X_m$, service times $T_k$ and $T_m$
with supports $\mathrm{supp}(f_{T_k}) = [\alpha_k, \beta_k]$ and $\mathrm{supp}(f_{T_m}) = [\alpha_m, \beta_m]$, we have

$$X_k \overset{st}{\le} X_m, \quad T_k \overset{st}{\ge} T_m, \quad \alpha_k \le \alpha_m
\;\Rightarrow\; \rho_k \ge \rho_m, \quad P^{RAND}(k) \le P^{RAND}(m). \qquad (11)$$

Hence, if the arrival, service and service support at all the queues satisfy the above stochastic order, then
the load-factor heuristic for allocation of instrumentation resources is optimal, according to optimization in
(3).

Proof: See Appendix G. □

Hence, slower arrivals and faster services, along with a mild condition on the support lower bounds of the
service distribution, imply higher tracking accuracy under the random matching rule, when the comparison
is formalized by a stochastic order.

The condition in (11) on the support of the service distributions is mild and is usually satisfied since one
mostly encounters service distributions with a lower bound of support equal to zero. However, it cannot
be dropped, as seen in the following example: let $T_k \equiv \mu_k$ (deterministic) and
$T_m \sim \mathrm{Unif}(0, 2\mu_m)$, the uniform distribution, with $\mu_k > \mu_m$. Since
$\alpha_k = \mu_k > \alpha_m = 0$, the condition in (11) does not hold, and indeed the conclusion fails since
$P^{RAND}(k) = 1 > P^{RAND}(m)$ in this example.

6.1.3. Special Case: Same Distribution Family 

We have so far established sufficient conditions for optimality of the load-factor heuristic when all the
queues employ either the FIFO or the random matching rule. We now consider a special case of arrival and
service distributions belonging to the same distribution family, where optimality of the load-factor heuristic
is guaranteed under both FIFO and random matching rules, without the need for additional conditions.

Corollary 1. (Optimality of Load-Factor Heuristic Under Same Distribution Family). When
the service distributions at different queues are linearly scaled versions of the same distribution, and the
same holds for all the arrival distributions as well, then the tracking accuracies at the queues are in the
reverse order of their load factors under both FIFO and random matching rules. Hence, here, the load-factor
heuristic is optimal for resource allocation, according to optimization in (3).

Proof: We show that the conditions for the FIFO rule in Theorem 1 are satisfied. For the random matching rule,
the condition on the lower bound of support in Theorem 2 is, however, violated. Hence, we need to prove the
above statement from scratch. See Appendix H. □

The above result holds if all the service distributions are, say, exponential or uniform. In practice,
the service distributions of different subsystems may be similar and hence, this result may be relevant.
The constraint on the arrival processes is however more restrictive in the case of an inter-connected network of
queues, since it restricts the arrivals to the system to be Poisson.

6.2. Unit-Batch Approximation 

We have so far demonstrated the effectiveness of the load-factor heuristic when the arrival and service
distributions are similar or, more generally, constrained to satisfy a stochastic order. Next, we provide
sufficient conditions to establish the optimality of the alternative heuristic for instrumentation allocation
based on unit-batch approximations, described in Section 4.2. Recall that the unit-batch approximation
selects queues for instrumentation in the increasing order of their probability of having a unit-sized busy
period.
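
The two selection rules can be sketched side by side as follows. This is a minimal illustration under stated assumptions: Poisson arrivals with rate `lam`, exponential service with rate `mu`, load factor taken as `lam / mu`, and the closed form $P[B = 1] = P[X > T] = \mu / (\lambda + \mu)$ that holds for independent exponentials. The function names and the example rates are hypothetical.

```python
import numpy as np

def select_by_load_factor(lam, mu, budget):
    """Pick the `budget` queues with the largest load factor rho = lam / mu."""
    rho = np.asarray(lam) / np.asarray(mu)
    return set(np.argsort(-rho)[:budget].tolist())

def select_by_unit_batch(lam, mu, budget):
    """Pick the `budget` queues with the smallest unit busy-period probability."""
    lam, mu = np.asarray(lam), np.asarray(mu)
    p_unit = mu / (lam + mu)  # P[X > T] for exponential X (rate lam) and T (rate mu)
    return set(np.argsort(p_unit)[:budget].tolist())

lam = [1.0, 1.0, 1.0, 1.0]
mu = [0.5, 2.0, 1.0, 4.0]
by_load = select_by_load_factor(lam, mu, budget=2)
by_unit = select_by_unit_batch(lam, mu, budget=2)
print(by_load, by_unit)
```

In this exponential example the unit busy-period probability equals $1/(1+\rho)$, so both rules rank the queues identically; for other service families the two rankings can differ.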

6.2.1. Optimality Under Stochastic Order 

We now show that the conditions given in Theorems 1 and 2, which guarantee optimality of the
load-factor heuristic, also guarantee the optimality of the unit-batch approximation.

Theorem 3. (Optimality of Unit-Batch Approximation Under Stochastic Order). We have
for two queues $Q_k$ and $Q_m$,

$$X_k \overset{st}{\le} X_m, \quad T_k \overset{st}{\ge} T_m, \quad |V_k| \overset{st}{\ge} |V_m|
\;\Rightarrow\; P[B_k = 1] \le P[B_m = 1], \quad P^{FIFO}(k) \le P^{FIFO}(m), \qquad (12)$$

$$X_k \overset{st}{\le} X_m, \quad T_k \overset{st}{\ge} T_m, \quad \alpha_k \le \alpha_m
\;\Rightarrow\; P[B_k = 1] \le P[B_m = 1], \quad P^{RAND}(k) \le P^{RAND}(m). \qquad (13)$$

Hence, the above conditions guarantee that the heuristic based on unit-batch approximation coincides with
the load-factor heuristic and hence, also achieves optimality in (3).

Proof: It is easy to see that $P[B_k = 1] = P[X_k > T_k] \le P[B_m = 1]$ since
$X_k - T_k \overset{st}{\le} X_m - T_m$. □

Hence, the unit-batch approximation achieves optimality in the above scenario where the load-factor
heuristic is also optimal. We now demonstrate the superiority of the unit-batch approximation over the
load-factor heuristic by considering a different scenario.

6.2.2. Optimality Under Convex Order 

We now consider a special scenario where all the queues have the same load factor but different
service variabilities. In this case, the load-factor heuristic fails to distinguish the tracking accuracies of
different queues and its performance is equivalent to a random selection of queues for instrumentation. On
the other hand, we show below that the unit-batch approximation achieves optimality when the queueing
services satisfy a convex order.

Theorem 4. (Optimality of Unit-Batch Approx. Under Convex Order and FIFO Rule). For
two $M/GI/\infty$ queues $Q_k, Q_m$ with i.i.d. Poisson arrivals with rates $\lambda_k, \lambda_m$ and i.i.d. service
times $T_k, T_m$, we have

$$\lambda_k T_k \overset{cx}{\le} \lambda_m T_m
\;\Rightarrow\; P[B_k = 1] \ge P[B_m = 1], \quad P^{FIFO}(k) \ge P^{FIFO}(m). \qquad (14)$$

Hence, under Poisson arrivals, convex order of normalized services and FIFO matching rule, the unit-batch
approximation is the optimal strategy for allocation of instrumentation resources, according to optimization
in (3).

Proof: $P[B_k = 1] = E[e^{-\lambda_k T_k}]$ is a concave function in $\lambda_k T_k$ and hence,
$P[B_k = 1] \ge P[B_m = 1]$. For the order of $P^{FIFO}(k)$ and $P^{FIFO}(m)$, see Appendix I. □

Hence, the unit-batch approximation achieves optimality over a wider range of distributions than the
load-factor heuristic. The relative performance of the load-factor heuristic and unit-batch approximation
for instrumentation allocation depends on the queues under consideration. For queues with similar service
distributions but significantly different load factors, the load-factor heuristic suffices to achieve efficient
allocation. On the other hand, if all the load factors are close to one another, the effects of service variability
and higher-order moments become significant and are not captured by the load-factor heuristic. In such
scenarios, there is significant advantage in employing the unit-batch approximation.

7. Product-Form Networks 

We have so far considered comparison of monitoring performance for different service distributions when
all the queues are infinite-server queues. In this section, we extend some of our results to more general
queueing networks consisting of egalitarian processor-sharing (PS) queues (with load factors less than one
to ensure stability) and infinite-server queues. These are part of the well-known product-form queues.

7.1. Processor-Sharing Network 

We first consider all the queues to be processor-sharing queues which makes comparison between them 
tractable. In the (egalitarian) processor sharing, each waiting transaction gets an equal share of service 
capacity. Since there is simultaneous processing of transactions, out-of-order departures are possible and 
there is uncertainty in matching arrival and departure timestamps. 

In a nutshell, we now show that the comparison results for infinite-server queues under random matching
in Theorem 2 hold for processor-sharing queues as well. However, the proof is more involved since the
sojourn time distributions of different transactions are correlated under the processor-sharing discipline.

We use the term job-length to refer to the amount of service required, and we use the term sojourn time
to denote the amount of time spent in the system. We denote the job-lengths by $J = [J(1), J(2), \ldots]$ and
assume that $J(i) \overset{i.i.d.}{\sim} f_J$.

Theorem 5. (Optimality of Heuristics in Processor-Sharing Queues Under Random Matching).
Given two processor-sharing queues with job lengths $J_k$ and $J_m$ and supports $[\alpha_k, \beta_k]$ and
$[\alpha_m, \beta_m]$, we have

$$J_k \overset{st}{\ge} J_m, \quad \alpha_k \le \alpha_m
\;\Rightarrow\; \rho_k \ge \rho_m, \quad P[B_k = 1] \le P[B_m = 1], \quad P^{RAND}(k) \le P^{RAND}(m). \qquad (15)$$

Proof: See Appendix J. □

The above results on comparison of two processor-sharing queues under random matching are identical
to those comparing two infinite-server queues in Theorem 2. Hence, our heuristics are optimal for
instrumentation under the above stochastic-order conditions when all the queues are either infinite-server
or processor-sharing queues. However, when we have both infinite-server and processor-sharing queues, the
above results are no longer valid and we consider this scenario in the next section.

4 The tracking accuracy of GI/M/1 queues with first-come first-serve (FCFS) or last-come first-serve with
preemption (LCFS-PR) disciplines, which are part of a product-form network, is unity. This is because there is a
fixed order of departures. Hence, they are ignored for instrumentation allocation.

Arrival rate 1 of Poisson process, 1000 transactions, 10 Monte Carlo runs.




7.2. Product-Form Network

We now compare monitoring performance of a processor-sharing queue with an infinite-server queue.
This analysis is more complicated since the sojourn times of the two queues have different dependency
structures. We limit ourselves to the scenario when the job lengths in the processor-sharing queue stochastically
dominate the service times of the infinite-server queue.

Theorem 6. (Optimality in Product-Form Networks Under Random Matching). Given a
processor-sharing queue with job-lengths $J_{PS}$ with support $[\alpha_{PS}, \beta_{PS}]$ and an infinite-server queue
with service $T_{INF}$, and arrivals $X_{PS}$ and $X_{INF}$,

$$X_{PS} \overset{st}{\le} X_{INF}, \quad J_{PS} \overset{st}{\ge} T_{INF}, \quad \alpha_{PS} \le \alpha_{INF}
\;\Rightarrow\; \rho_{PS} \ge \rho_{INF}, \quad P[B_{PS} = 1] \le P[B_{INF} = 1], \quad P^{RAND}_{PS} \le P^{RAND}_{INF}. \qquad (16)$$

Proof: See Appendix K. □

Hence, in a product-form network, under the above stochastic order, our two heuristics coincide with
the optimal instrumentation strategy.



8. Numerical Analysis 

We have so far provided a precise set of theoretical conditions when the two proposed heuristics coincide
with the optimal instrumentation allocation strategy. In this section, we compare the performance of various
instrumentation strategies through simulations. There are mainly two questions we seek to answer: How
do our heuristics compare with the optimal solution when the theoretical conditions in Sections 6 and 7 for
optimality are not met? What is the relative performance of the two heuristics in different load regimes?

We consider infinite-server queues with service distributions belonging to the Weibull family. The Weibull
distribution is a rich family allowing us to tune the rate and the randomness of the service time separately by
varying the scale and the shape parameters, and also includes the exponential distribution (shape parameter
$w = 1$). Note that for the same scale parameter $c$, the variance decreases with the shape parameter $w$. Hence,
distributions with $w < 1$ have higher variance than the exponential distribution, and vice versa.
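
The dependence of the Weibull moments on the shape parameter can be checked directly from the standard closed forms, mean $c\,\Gamma(1 + 1/w)$ and variance $c^2[\Gamma(1 + 2/w) - \Gamma(1 + 1/w)^2]$. The helper names below are illustrative and the three shape values are arbitrary examples.

```python
from math import gamma

def weibull_mean(scale, shape):
    """Mean of a Weibull distribution with the given scale c and shape w."""
    return scale * gamma(1.0 + 1.0 / shape)

def weibull_variance(scale, shape):
    """Variance of a Weibull distribution with the given scale c and shape w."""
    g1 = gamma(1.0 + 1.0 / shape)
    g2 = gamma(1.0 + 2.0 / shape)
    return scale ** 2 * (g2 - g1 ** 2)

for w in (0.5, 1.0, 2.0):
    print(w, weibull_mean(1.0, w), weibull_variance(1.0, w))
```

For unit scale, the variance at $w = 1$ is that of the unit-rate exponential, with larger values for $w < 1$ and smaller values for $w > 1$.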



Figure 6: Comparison of instrumentation strategies. Panels: (a) objective value, (b) ratio of the objective under
each heuristic to the optimal objective, (c) fraction of overlap with the optimal selection; the horizontal axis in
each panel is the maximum service rate $T_{max}$. Setup: instrument $E = 2$ out of $|Q| = 10$ states, unit arrival
rate ($\lambda = 1$) of Poisson process, service rates $\mu_k \overset{i.i.d.}{\sim} \mathrm{Unif}[0.5, T_{max}]$, Weibull shape
parameter $w_k \overset{i.i.d.}{\sim} \mathrm{Unif}[0.1, 2]$, 1000 configurations.
Obj. $= E + \sum_{Q_k \in S} (1 - z_k) P^{FIFO}(k)$, see (3).

8.1. Effect of Matching Policies 

In Fig. 5, we compare the tracking accuracies $P^{\gamma}$ of policies $\gamma$ given by the FIFO, random matching and
the optimal maximum-likelihood (ML) policies. We also compare the unit-batch approximation with the
exact tracking accuracy. In Fig. 5a, for the shape parameter $w = 1$, we have the exponential distribution,
and all the matching policies, viz., ML, random, and FIFO matchings have equal performance, consistent
with the analytical results in [9]. In Fig. 5b, for the shape parameter $w > 1$, FIFO has the same performance
as ML, and is better than random matching, again consistent with theory (see footnote 5). In Fig. 5c, for the shape
parameter $w < 1$, we have heavy-tailed services, and here, random matching has better accuracy than the FIFO
rule. This is intuitive since out-of-order departures are more likely under heavy-tailed services. Moreover,
the tracking accuracy in all these cases increases with the service rate as predicted.

In all the cases, there is a non-trivial gap between the actual tracking accuracies and the unit-batch 
approximation (up to about 10%); however, the approximation correctly follows the trend of the true 
values. Hence, we can expect solutions based on the exact and approximate evaluation to pick a similar set 
of queues for instrumentation, thereby leading to efficient allocation of monitoring resources, as discussed 
below. 

8.2. Comparison of Instrumentation Strategies 

In Fig. 6, we compare our instrumentation strategies based on the load factor and the unit-batch
approximation with the optimal strategy under the optimization rule in (3). As a benchmark, we also compare the
proposed strategies with random instrumentation, i.e., uniformly selecting a subset of queues for
instrumentation.

We consider Weibull service times and FIFO matching (similar results are observed under random matching).
We run simulations under randomly chosen parameters for each queue and then average the results of
different configurations. Specifically, the service rates are drawn i.i.d. uniformly between a minimum and
a maximum service rate, and so are the shape parameters. We vary the maximum service rate to obtain
a more diverse set of service distributions for the queues under consideration for instrumentation allocation.
Since the parameters are randomly chosen, the sufficient conditions for optimality of our heuristics proven
in Section 6 are not met, and we do not expect our heuristics to exactly coincide with the optimal allocation
strategy.
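
One random configuration of this experiment can be sketched as follows. This is not the simulation code behind the reported figures: it only draws the parameters and computes the two heuristic selections, without evaluating the optimal strategy. It assumes that the Weibull scale is the reciprocal of the drawn service rate, that the load factor is the arrival rate times the mean service time, and that $P[B = 1] = E[e^{-\lambda T}]$ under Poisson arrivals is estimated by Monte Carlo. All function names are hypothetical.

```python
import numpy as np
from math import gamma

def draw_configuration(rng, n_queues=10, max_rate=5.0):
    """Draw service rates and Weibull shape parameters for one configuration."""
    rates = rng.uniform(0.5, max_rate, n_queues)
    shapes = rng.uniform(0.1, 2.0, n_queues)
    return rates, shapes

def heuristic_selections(rates, shapes, budget=2, lam=1.0, n=50_000, seed=0):
    """Return the queue sets picked by the load-factor and unit-batch heuristics."""
    rng = np.random.default_rng(seed)
    scales = 1.0 / rates  # assumption: Weibull scale is the reciprocal of the service rate
    load = np.array([lam * c * gamma(1.0 + 1.0 / w) for c, w in zip(scales, shapes)])
    # Monte Carlo estimate of P[B = 1] = E[exp(-lam * T)] for each queue.
    p_unit = np.array([np.mean(np.exp(-lam * c * rng.weibull(w, n)))
                       for c, w in zip(scales, shapes)])
    by_load = set(np.argsort(-load)[:budget].tolist())   # largest load factors
    by_unit = set(np.argsort(p_unit)[:budget].tolist())  # smallest P[B = 1]
    return by_load, by_unit

rng = np.random.default_rng(1)
rates, shapes = draw_configuration(rng)
by_load, by_unit = heuristic_selections(rates, shapes)
print(by_load, by_unit, len(by_load & by_unit) / 2)
```

Repeating this over many configurations and values of `max_rate` would give the overlap between the two heuristics; comparing against the optimal selection additionally requires the exact tracking accuracies.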



5 It is shown in [9] that FIFO matching coincides with the optimal ML tracking when the shape parameter $w > 1$,
i.e., there is less variation in service times.

In Fig. 6, we see that the performance of the two heuristics gets closer to that of the optimal strategy as
the maximum service rate increases, leading to a more diverse set of queues. For the load-factor heuristic,
this is because the load factors of different queues are well separated as the queues become more diverse.
For the unit-batch heuristic, this is because, in addition, the service rates are increasing on average, leading
to tighter approximation of the tracking accuracy. On the other hand, the gap between optimal allocation
and random allocation increases with the maximum service rate since random allocation performs poorly
when the queues are diverse. We also note that the performance of the unit-batch approximation is superior
to that of the load-factor heuristic, but they become close when the queues have well-separated load factors, as
predicted in Section 4.

9. Conclusion 

In this paper, we considered the problem of optimal instrumentation allocation for tracking transactions 
in a queueing network. Two types of monitoring resources are considered in the form of identifiers and 
timestamps. Identifiers provide precise tracking but are limited while timestamps are imprecise but available 
everywhere. The optimal allocation strategy selects queues with least timestamp-based tracking accuracies 
for introducing identifiers. We proposed two simple heuristics for allocation which coincide with the optimal
strategy under certain conditions on arrival and service processes. Simulations show that our solutions are
effective even when there is a deviation from the optimality conditions. 

While providing a strong theoretical foundation and effective solutions for instrumentation allocation, 
we acknowledge that the overall problem has a broader range of challenges. For instance, in practice, the 
model for arrivals and services may not be known and needs to be estimated from data as well. There may 
be systems where complete timestamp information may not be available. We have assumed equal costs 
for instrumenting different components, while with unequal costs, we need to investigate new optimality 
conditions for our heuristics. We have assumed an infinite-server queueing system, while in reality there are 
a finite number of servers. The optimality results can in principle be extended to this scenario. However, 
direct analysis of such a system is much more involved since the service times of different packets are not 
independent. Moreover, the infinite-server system is the worst-case scenario for timestamp-based tracking 
since a finite-server system is less likely to produce out-of-order transactions. In this sense, the recommended 
instrumentation solution can be viewed as maximizing a lower bound on the tracking accuracy under finite- 
server queueing. Other challenges involve analyzing the effect of admission control and allowing for dynamic 
switching of data collection between different systems.

Appendix A. Accuracy Under FIFO

Lemma 3. The tracking accuracy simplifies under the FIFO rule as

$$P^{FIFO} = \sum_{b=1}^{\infty} P[\pi^* = I, B = b],$$

where each term in the series $P[\pi^* = I, B = b]$ is given by

$$P[\pi^* = I, B = b] = P\Big( \bigcap_{i=1}^{b-1} \big\{ T(i) \in [X(i), X(i) + T(i+1)] \big\} \cap \{ T(b) < X(b) \} \Big),$$

where $X(i)$ and $T(i)$ are the inter-arrival and service times.

Proof: Given the busy-period size $B = b$, the event that the FIFO rule is correct is

$$A_b^{FIFO} = \bigcap_{i=1}^{b-1} \{ T(i) \le X(i) + T(i+1) \},$$

since the $i$-th transaction needs to depart sooner than the $(i+1)$-th transaction. The event that the busy-period
size is $B = b$ is given by

$$\{B = b\} = \bigcap_{i=1}^{b-1} \Big\{ T(i) \in \Big[ X(i), \sum_{j=i}^{b} X(j) \Big] \Big\} \cap \{ T(b) < X(b) \}.$$

Since $P^{FIFO} = \sum_{b=1}^{\infty} P[A_b^{FIFO} \cap \{B = b\}]$, the result follows. □

Appendix B. Accuracy Under Random Rule 

In order to compute the tracking accuracy $P^{RAND}$ under the random matching rule, we need to find the
number of valid matchings. The number of such valid matchings is given by the number of perfect matchings
in the 0-1 biadjacency matrix $A_k$ defined as follows: for a bipartite graph with arrivals $Y_k$ in one bipartition
and departures $D_k$ in the other, the presence of edge $(i, j)$ in $A_k$ indicates positive likelihood of the $i$-th
arrival corresponding to the $j$-th departure,

$$A_k(i,j) = 1 \iff f_{T_k}[D_k(j) - Y_k(i)] > 0, \quad \forall\, 1 \le i, j \le b. \qquad (B.1)$$

Any valid matching between the arrivals and the departures is a perfect matching on the biadjacency matrix
$A_k$, where a perfect matching is defined as a set of pairwise non-adjacent edges where all vertices are
matched. The number of perfect matchings for the biadjacency matrix $A$ is given by its permanent

$$\mathrm{perm}(A) := \sum_{\pi} \prod_{i=1}^{b} A(i, \pi(i)), \qquad (B.2)$$

where the sum is over all the permutation vectors $\pi$ over $\{1, \ldots, b\}$, conditioned on busy-period size $B = b$.
Denote the perfect matching chosen by random matching as $\pi^{RAND}$. Since each perfect matching is chosen
with uniform probability and there are $\mathrm{perm}(A)$ of them, the probability of choosing any one of them
is $\mathrm{perm}(A)^{-1}$. Using this fact, it is easy to derive the expression for tracking accuracy under random
matching,

$$P^{RAND} = \sum_{b=1}^{\infty} P[\pi^{RAND} = \pi^*, B = b]
= \sum_{b=1}^{\infty} \sum_{a} \frac{P[A = a, B = b]}{\mathrm{perm}(a)}. \qquad (B.3)$$
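
The permanent in (B.2) and the resulting success probability of random matching can be illustrated with a brute-force computation. The sketch below enumerates all permutations, which is only practical for small busy periods; the function names and example matrices are illustrative assumptions.

```python
from itertools import permutations

def permanent(a):
    """Permanent of a square 0-1 matrix, by brute force over all permutations."""
    b = len(a)
    total = 0
    for pi in permutations(range(b)):
        prod = 1
        for i in range(b):
            prod *= a[i][pi[i]]
        total += prod
    return total

def random_matching_success(a):
    """Probability that a uniformly chosen valid matching is the true one."""
    return 1.0 / permanent(a)

all_valid = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]    # every arrival may match every departure
upper = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]        # only the identity matching is valid
two_block = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]    # first two transactions are interchangeable
print(permanent(all_valid), permanent(upper), permanent(two_block))
print(random_matching_success(two_block))
```

The true matching is always valid, so the permanent is at least one and the reciprocal is well defined.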

Appendix C. Introduction to Stochastic Order 

Appendix C.0.1. Stochastic Order 

The stochastic order (also known as the usual stochastic order) is defined as follows [19, 20].

Definition 1 (Stochastic Order). A variable $Z_1$ is said to be stochastically dominant with respect to a
variable $Z_2$, denoted by $Z_1 \overset{st}{\ge} Z_2$, if

$$Z_1 \overset{st}{\ge} Z_2 \iff E[\phi(Z_1)] \ge E[\phi(Z_2)], \qquad (C.1)$$

for all increasing functions $\phi$ for which expectations exist.

Naturally, the above definition implies

$$Z_1 \overset{st}{\ge} Z_2 \;\Rightarrow\; E[Z_1] \ge E[Z_2]. \qquad (C.2)$$

We intend to compare tracking accuracies at queues when their arrival processes satisfy a certain stochastic
order and their service processes satisfy the reverse stochastic order. We leverage the stochastic orders
to guarantee an order on the tracking accuracies at different queues and hence, optimality of our heuristics.
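
A simple numerical check of the stochastic order uses the standard equivalent characterization that $Z_1 \overset{st}{\ge} Z_2$ holds if and only if $P[Z_1 > z] \ge P[Z_2 > z]$ for all $z$. The sketch below compares survival functions on a finite grid, so it can only suggest, not prove, that an order holds. The function name, the grid, and the example distributions are assumptions made for illustration.

```python
import numpy as np

def dominates_st(survival_1, survival_2, grid):
    """Check P[Z1 > z] >= P[Z2 > z] at every grid point z."""
    return bool(np.all(survival_1(grid) >= survival_2(grid) - 1e-12))

grid = np.linspace(0.0, 20.0, 2001)
exp_slow = lambda z: np.exp(-0.5 * z)   # exponential with mean 2
exp_fast = lambda z: np.exp(-2.0 * z)   # exponential with mean 0.5
weib = lambda z: np.exp(-z ** 2)        # Weibull, unit scale, shape 2 (mean about 0.89)

print(dominates_st(exp_slow, exp_fast, grid))
print(dominates_st(weib, exp_fast, grid), dominates_st(exp_fast, weib, grid))
```

In the second comparison the Weibull variable has the larger mean, yet neither variable dominates the other because the survival functions cross; this is the sense in which a comparison of means alone is weaker than a stochastic order.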

Appendix CO. 2. Convex Order 

We define another notion of comparison of random variables known as the convex order [19l . Ch. 3] . 

Definition 2 (Convex Order). A variable Z\ is said to be smaller than Z 2 , denoted by Z\ < Z 2 , if for 



all convex functions <f> : K i— > E[0(Zi)] < E[0(Z2)]. 

The convex order compares the variability of random variables and requires equal mean values, 



In our context, we intend to compare queues under the same load factor but with different variability in 
services. Intuitively, a service distribution with higher variability results in more uncertainty in the order of 
departures implying lower tracking accuracy, and we use the notion of convex order to capture this effect. 

The stochastic and convex orders thus deal with different aspects of comparison of random variables: the 
former deals with the magnitudes while the latter deals with variability, and one does not imply the other. 
There are many sufficient conditions which can be easily checked for the stochastic or convex order to hold 
[l~9j ]. For a set of queues, we can use these conditions to check if the stochastic or the convex orders hold, in 
which case, we can draw conclusions about the optimality of our heuristics for instrumentation allocation. 

Appendix D. Proof of Lemma [T] 

We have for b > 1, 



cx 



Z X <Z 2 ^ E[Zi] = E[Z 2 ], Var[Zi] < Var[Z 2 ]. 



cx 



P[B k =b}=¥[ p| X k (i) < T k (i) <^2x k {j),X k {b) > T k {b)}. 




We have P[B k > 1] = F Tk [X k ] and hence, P[B k > 1] > P[B m > 1]. Now consider, 



Pk(x) 



P[B k >b+l\B k >b, X k {b +l)=x 



b b 



F[T k (b+l)\J{T k (i)-J2Xk(b)}>x], 



i=l j—i 



P[max{T fc (6 + l),T fc (6) - X k (b), ...,}> x 



(D.l) 



We now claim that for b > 1, 



st st 



X k < X m ,T k > T m =>■ p k (x) > p m (x). 



(D.2) 



This is because each term in (D.1) satisfies stochastic dominance for $i = 1, \ldots, b$,

$$X_k \le_{st} X_m,\; T_k \ge_{st} T_m \;\Rightarrow\; T_k(i) - \sum_{j=i}^{b} X_k(j) \ge_{st} T_m(i) - \sum_{j=i}^{b} X_m(j).$$
Indeed the above terms are correlated, but they have the same dependency relationship for both queues $Q_k$
and $Q_m$. Technically, this means that they share the same copula. The copula $C$ for a multivariate variable
$Z$ is the mapping on the distribution functions such that

$$F_Z(z) = C[F_{Z(1)}(z(1)),\; F_{Z(2)}(z(2)),\; \ldots]. \tag{D.3}$$

By [19, Thm. 6.B.14], under the same copula, we have the multivariate stochastic order

$$[T_k(b+1),\; T_k(b) - X_k(b),\; \ldots] \ge_{st} [T_m(b+1),\; T_m(b) - X_m(b),\; \ldots].$$

Hence, their maxima also satisfy stochastic order and (D.2) is true. Since $p_k(x)$ and $p_m(x)$ are decreasing
in $x$, the lemma holds. □
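The monotonicity used in this proof can be illustrated on coupled sample paths. The sketch below is not from the paper and makes several assumptions: the busy-period size is taken to be the number of arrivals served before the infinite-server queue first empties, queue $k$ is given inter-arrival gaps that are pointwise no larger and service times that are pointwise no smaller than those of queue $m$ (a coupling that realizes $X_k \le_{st} X_m$ and $T_k \ge_{st} T_m$), and the function names, distributions, and scale factors are hypothetical.

```python
import random

def busy_period_size(interarrivals, services):
    """Number of transactions in the first busy period of an infinite-server queue.

    interarrivals[i] is the gap between arrival i+1 and arrival i+2, and
    services[i] is the service time of arrival i+1. Returns len(services)
    if the busy period has not ended within the supplied samples.
    """
    arrival = 0.0            # arrival time of the current transaction
    latest_departure = 0.0   # latest departure among transactions seen so far
    for b, (x, t) in enumerate(zip(interarrivals, services), start=1):
        latest_departure = max(latest_departure, arrival + t)
        arrival += x         # arrival time of transaction b + 1
        if arrival >= latest_departure:
            return b         # the queue is empty when transaction b + 1 arrives
    return len(services)

def coupled_busy_periods(trials=1000, horizon=200, seed=0):
    """Busy-period sizes (B_k, B_m) driven by common random numbers."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(trials):
        e = [rng.expovariate(1.0) for _ in range(horizon)]
        u = [rng.random() for _ in range(horizon)]
        # Queue k: gaps halved, services unscaled. Queue m: the opposite.
        b_k = busy_period_size([0.5 * v for v in e], u)
        b_m = busy_period_size(e, [0.5 * v for v in u])
        pairs.append((b_k, b_m))
    return pairs
```

Under this coupling every term $T(i) - \sum_j X(j)$ is larger at queue $k$ than at queue $m$, so each simulated pair is expected to satisfy $B_k \ge B_m$, which is the sample-path counterpart of the stochastic order between the busy-period sizes.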

Appendix E. Proof of Lemma 2

Let $T' := \lambda T$ be the normalized service time and let $X'(i)$ be i.i.d. Poisson arrivals with unit rate.
$P[B > b \mid X' = x]$ is given by

$$P[B > b \mid X' = x]
= P\Big[\max\Big(T'(1),\; T'(2) + x(1),\; \ldots,\; T'(b) + \sum_{i=1}^{b-1} x(i)\Big) > X'(b)\Big]
= 1 - E\Big[e^{-\max\left(T'(1),\; T'(2) + x(1),\; \ldots,\; T'(b) + \sum_{i=1}^{b-1} x(i)\right)}\Big]. \tag{E.1}$$

Now, from convex order,

$$T'_k \le_{cx} T'_m \;\Rightarrow\; \max\Big(T'_k(1),\; T'_k(2) + x(1),\; \ldots,\; T'_k(b) + \sum_{i=1}^{b-1} x(i)\Big)
\le_{cx} \max\Big(T'_m(1),\; T'_m(2) + x(1),\; \ldots,\; T'_m(b) + \sum_{i=1}^{b-1} x(i)\Big). \tag{E.2}$$

Since (E.1) is convex in the argument, it follows the same order of the service distributions. Since the
convex order is closed under mixtures [19, Thm. 3.A.12], marginalizing over the arrival times $X'$ preserves
the order. Hence,

$$T'_k \le_{cx} T'_m \;\Rightarrow\; P[B_k > b] \le P[B_m > b],$$

which in turn is equivalent to a stochastic order. □

Appendix F. Proof of Theorem 1

Given the busy-period size $B = b$, denote the vector of spreads as $\mathbf{V}_k$, where the $i$-th element is given by

$$V_k(i) := T_k(i) - T_k(i+1), \quad 1 \le i \le b-1. \tag{F.1}$$

Note that the elements in the spread vector have identical distributions but are dependent on one
another, unlike the service times of the infinite-server queue which are independent. We have

$$P_b^{\mathrm{FIFO}} = P\Big[\bigcap_{i=1}^{b-1} \{T_k(i) \le X_k(i) + T_k(i+1)\}\Big],$$

since the $i$-th transaction needs to depart sooner than the $(i+1)$-th transaction. From the definition of the spread
vector in (F.1), this is equal to

$$P_b^{\mathrm{FIFO}} = P\Big[\bigcap_{i=1}^{b-1} \{V_k(i) \le X_k(i)\}\Big]
= P[T_k(1) \le T_k(2) \le \ldots] + P\Big[\bigcap_{i=1}^{b-1} \{0 < V_k(i) \le X_k(i)\}\Big]$$



$$= \ldots + \ldots\; P\Big[\bigcap_{i=1}^{b-1} \{|V_k(i)| \le X_k(i)\}\Big],$$

since $\mathbf{V}_k$ is symmetric around zero. We individually have

$$|V_k(i)| \ge_{st} |V_m(i)|, \quad X_k(i) \le_{st} X_m(i), \quad 1 \le i < b,$$

which implies

$$|V_k(i)| - X_k(i) \ge_{st} |V_m(i)| - X_m(i), \quad 1 \le i < b.$$

Since the spreads $V_k(1), V_k(2), \ldots$ are correlated, we use [19, Thm. 6.B.14] to prove the multivariate
stochastic order

$$|\mathbf{V}_k| - \mathbf{X}_k \ge_{st} |\mathbf{V}_m| - \mathbf{X}_m, \tag{F.2}$$

since $|\mathbf{V}_k| - \mathbf{X}_k$ and $|\mathbf{V}_m| - \mathbf{X}_m$ share the same copula, defined in (D.3). From (F.2), the conditional
accuracies $P_b^{\mathrm{FIFO}}$ at the two queues are ordered for each busy-period size $b$.

This implies the order of tracking accuracies in (10) by marginalizing over the busy-period sizes, since $P_b^{\mathrm{FIFO}}$
decreases in the busy-period size $b$ and the busy periods satisfy stochastic order, from Lemma 1. □
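The event whose probability defines $P_b^{\mathrm{FIFO}}$ can be written either in terms of departure times or in terms of the spreads in (F.1). The sketch below is an illustration under assumed inputs, not the paper's code: the function names and the sample timestamps are hypothetical, and the first arrival is placed at time zero.

```python
def departures_in_arrival_order(interarrivals, services):
    """True if transactions depart in the same order in which they arrived."""
    arrivals = [0.0]
    for x in interarrivals:
        arrivals.append(arrivals[-1] + x)
    departures = [a + t for a, t in zip(arrivals, services)]
    return all(d1 <= d2 for d1, d2 in zip(departures, departures[1:]))

def spreads_within_gaps(interarrivals, services):
    """True if V(i) = T(i) - T(i+1) <= X(i) for every consecutive pair."""
    spreads = [t1 - t2 for t1, t2 in zip(services, services[1:])]
    return all(v <= x for v, x in zip(spreads, interarrivals))

# Hypothetical busy period of three transactions with unit inter-arrival gaps.
gaps = [1.0, 1.0]
ordered_services = [1.5, 1.0, 0.25]    # departures at 1.5, 2.0, 2.25
overtaken_services = [3.0, 1.0, 0.25]  # first transaction departs last
```

For both service vectors the two functions agree: the departure-order test and the spread test accept `ordered_services` and reject `overtaken_services`, so FIFO matching would be correct only in the first case.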

Appendix G. Proof of Theorem 2

In order for (11) to hold, it suffices to show that

$$\mathrm{perm}(A_k) \mid \{B_k = b\} \ge \mathrm{perm}(A_m) \mid \{B_m = b\}, \tag{G.1}$$

since the tracking accuracy under random matching is given by (B.3), and taking expectation over $B_m$ and
$B_k$ preserves the order since $B_k \ge_{st} B_m$ from Lemma 1. Since $\mathrm{perm}(A)$ is the number of matchings for
biadjacency matrix $A$, more edges in $A$ implies higher $\mathrm{perm}(A)$. Let $[\alpha_k, \beta_k]$ be the support of $T_k$ and
$[\alpha_m, \beta_m]$ of $T_m$. From (B.1) for $k$, the departure of the $i$-th arrival has an edge with the $j$-th arrival, for
$1 \le i < j \le b$, iff

$$\alpha_k \le T_k(i) - \sum_{a=i}^{j-1} X_k(a) \le \beta_k. \tag{G.2}$$

By definition of the support bound, $T_k(i) \le \beta_k$ a.s. Hence, the upper bound in (G.2) always holds. Since
$X_k(i) > 0$, we have the probability of an edge as

$$P[A(i,j) = 1] = \bar{F}_{T_k}\Big[\alpha_k + \sum_{a=i}^{j-1} X_k(a)\Big]. \tag{G.3}$$
Conditioning on the same arrival realizations $\mathbf{X}_k, \mathbf{X}_m = \mathbf{x}$, from the definition of stochastic dominance,

$$\bar{F}_{T_k}\Big[\alpha_k + \sum_{a=i}^{j-1} x(a)\Big] \ge \bar{F}_{T_m}\Big[\alpha_k + \sum_{a=i}^{j-1} x(a)\Big]
\ge \bar{F}_{T_m}\Big[\alpha_m + \sum_{a=i}^{j-1} x(a)\Big],$$

when $\alpha_k \le \alpha_m$. Now since the functions are decreasing in $x$ and $X_k \le_{st} X_m$, the order is preserved on
removing the conditioning. Hence, (G.1) holds, implying (11). □
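The link between edges and the permanent can be made concrete on a small busy period. The sketch below is an illustration with assumed conventions, not the paper's implementation: a departure is taken to be compatible with an arrival when their time difference lies in the service-time support $[\alpha, \beta]$, rows index departures and columns index arrivals, and the permanent is computed by brute force, which is only practical for small $b$. The function names and timestamps are hypothetical.

```python
from itertools import permutations

def biadjacency(arrivals, departures, alpha, beta):
    """0/1 matrix with entry 1 when (departure - arrival) lies in [alpha, beta]."""
    return [[1 if alpha <= d - a <= beta else 0 for a in arrivals]
            for d in departures]

def permanent(matrix):
    """Permanent of a square matrix by enumerating permutations.

    For a 0/1 biadjacency matrix this counts the perfect matchings.
    """
    n = len(matrix)
    total = 0
    for sigma in permutations(range(n)):
        product = 1
        for row, col in enumerate(sigma):
            product *= matrix[row][col]
            if product == 0:
                break
        total += product
    return total

# Hypothetical timestamps for a busy period of three transactions.
arrivals = [0.0, 1.0, 2.0]
departures = [2.5, 2.2, 3.0]
a_high_lower_bound = biadjacency(arrivals, departures, alpha=0.9, beta=3.5)
a_low_lower_bound = biadjacency(arrivals, departures, alpha=0.4, beta=3.5)
```

Lowering the assumed lower support bound from 0.9 to 0.4 adds one edge and raises the number of feasible matchings from 2 to 4, in line with the observation that more edges in $A$ imply a higher $\mathrm{perm}(A)$.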

Appendix H. Proof of Corollary 1

Let $T'_i := \lambda T_i$ for $i = 1, 2$ be the normalized service times and let $X'(i)$ be i.i.d. arrivals with unit rate.
For any positive variable $T'_1$ and $T'_2 = cT'_1$ with $0 < c < 1$, we have $T'_1 \ge_{st} T'_2$. First consider the FIFO matching
rule,

$$|\mathbf{V}_1| \ge |\mathbf{V}_2| = c|\mathbf{V}_1|, \quad \forall\, 0 < c < 1, \tag{H.1}$$

and hence, the conditions in Theorem 1 for the order of accuracies under FIFO matching are satisfied.

For the random matching rule, let $[\alpha, \beta]$ be the support of $T'_1$. We have $\alpha > c\alpha$, and hence, the condition in
Theorem 5 is, in fact, violated. We revisit the probability of having an edge in the biadjacency matrix $A$,

$$P_{T'_1}[A(i,j) = 1] = \bar{F}_{T'_1}\Big[\alpha + \sum_{k=i}^{j-1} x(k)\Big].$$

For the service time $T'_2 = cT'_1$ with $c < 1$, we have

$$P_{T'_2}[A(i,j) = 1] = \bar{F}_{T'_1}\Big[\alpha + \frac{1}{c} \sum_{k=i}^{j-1} x(k)\Big] \le P_{T'_1}[A(i,j) = 1],$$

and hence, the result holds. □

Appendix I. Proof of Theorem 4

Let $T' := \lambda T$ be the normalized service time and let $X'(i)$ be i.i.d. Poisson arrivals with unit rate. Given
the busy-period size $B = b$, we have

$$P_b^{\mathrm{FIFO}} \mid \{B = b\} = P\Big[\bigcap_{i=1}^{b-1} \{T'(i) - T'(i+1) \le X'(i)\}\Big]
= E\Big[\exp\Big(-\sum_{i=1}^{b-1} (T'(i) - T'(i+1))^+\Big)\Big]
= E\Big[\exp\Big(-\sum_{i=1}^{b} a_{i,\pi} T'(i)\Big) \,\Big|\, \Pi(\mathbf{T}') = \pi\Big],$$

where the second equality uses $P[X'(i) \ge v] = e^{-v^+}$ for unit-rate exponential inter-arrival times, and
$a_{i,\pi} = 0, \pm 1$ are fixed coefficients conditioned on the event that the service times $\mathbf{T}'$ follow a certain
permutation $\pi$. Now $\exp(-\sum_{i=1}^{b} a_{i,\pi} T'(i))$ is a concave function of $\sum_{i=1}^{b} a_{i,\pi} T'(i)$ and all permutations $\pi$
of the service times are equiprobable at both the queues (since all the service times are i.i.d.).
On the lines of [19, Thm. 3.A.19], we can show that when $\mathbf{T}'_k$ and $\mathbf{T}'_m$ are conditioned on the same
permutation $\pi$,

$$T'_k \le_{cx} T'_m \;\Rightarrow\; \sum_{i=1}^{b} a_{i,\pi} T'_k(i) \le_{cx} \sum_{i=1}^{b} a_{i,\pi} T'_m(i).$$

Hence,

$$T'_k \le_{cx} T'_m \;\Rightarrow\; P_b^{\mathrm{FIFO}}(k) \mid \{B_k = b\} \ge P_b^{\mathrm{FIFO}}(m) \mid \{B_m = b\}.$$

Since $P_b^{\mathrm{FIFO}}$ is decreasing in $b$ and the busy-period sizes at $k$ and $m$ follow the stochastic order, the order
carries through when we marginalize over the busy-period sizes. □

Appendix J. Proof of Theorem 5

Let $\mathbf{T}_k$ and $\mathbf{T}_m$ be the sojourn times of the jobs in the two queues. The sojourn times satisfy

$$X_k \le_{st} X_m,\; J_k \ge_{st} J_m \;\Rightarrow\; T_k(i) \ge_{st} T_m(i).$$

Now $\mathbf{T}_k$ and $\mathbf{T}_m$ are correlated, unlike the infinite-server case. However, $\mathbf{T}_k$ and $\mathbf{T}_m$ have the same copula
since they are both processor sharing queues and by [19, Thm. 6.B.14],

$$X_k \le_{st} X_m,\; J_k \ge_{st} J_m \;\Rightarrow\; \mathbf{T}_k \ge_{st} \mathbf{T}_m.$$

On lines of Lemma 1,

$$X_k \le_{st} X_m,\; \mathbf{T}_k \ge_{st} \mathbf{T}_m \;\Rightarrow\; B_k \ge_{st} B_m.$$

Note that the lower bound of support of each sojourn time is the same as the job lengths. On lines
of Appendix G, (15) holds. □



Appendix K. Proof of Theorem 6

We first provide a result that under the stochastic dominance assumption, the sojourn times of the
processor-sharing queue dominate those of the infinite-server queue.

Proposition 2. (Sojourn Times in Infinite Server and Processor Sharing Queues). We have

$$J_{PS} \ge_{st} T_{INF} \;\Rightarrow\; \mathbf{T}_{PS} \ge_{st} \mathbf{T}_{INF}. \tag{K.1}$$

Proof: The multivariate ordering is implied by the conditional ordering

$$T_{PS}(i) \,\Big|\, \bigcap_{k=1}^{i-1} \{T_{PS}(k) = t_k\} \ge_{st} T_{INF}.$$

Now the sojourn times $T_{PS}(i)$ at the processor-sharing queue are at least the job lengths with probability 1.
Hence, upon any conditioning

$$T_{PS}(i) \,\Big|\, \bigcap_{k=1}^{i-1} \{T_{PS}(k) = t_k\} \ge J_{PS}(i), \quad i = 1, 2, \ldots$$

Hence, the result in (K.1) holds. □
The above result follows the intuition that when larger jobs are arriving to the processor-sharing queue
than to the infinite-server queue, the sojourn times in the processor-sharing queue are longer. However, the
converse is not always true since even longer jobs can have shorter sojourn times in the infinite-server queue
due to simultaneous processing of the jobs.
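The fact that processor-sharing sojourn times are at least the job lengths can be observed in a small event-driven simulation. The sketch below is illustrative only and rests on assumptions not stated in the paper: a single unit-capacity server shared equally among the jobs present, arrival times supplied in increasing order, and a hypothetical function name. In an infinite-server queue fed with the same jobs, each sojourn time would equal the job length itself.

```python
def processor_sharing_sojourns(arrivals, job_lengths):
    """Sojourn times in a unit-capacity processor-sharing queue.

    arrivals must be sorted in increasing order; job_lengths[i] is the work
    brought by arrival i. With n jobs present, each is served at rate 1/n.
    """
    n = len(arrivals)
    remaining = {}                 # job index -> remaining work
    departures = [None] * n
    now = 0.0
    nxt = 0                        # index of the next arrival
    while nxt < n or remaining:
        if remaining:
            # The job with least remaining work finishes first.
            j_min = min(remaining, key=remaining.get)
            t_finish = now + remaining[j_min] * len(remaining)
        else:
            j_min, t_finish = None, float("inf")
        t_arrive = arrivals[nxt] if nxt < n else float("inf")
        if t_arrive <= t_finish:
            # Advance to the arrival, sharing the elapsed time equally.
            if remaining:
                done = (t_arrive - now) / len(remaining)
                for j in remaining:
                    remaining[j] -= done
            now = t_arrive
            remaining[nxt] = job_lengths[nxt]
            nxt += 1
        else:
            # Advance to the completion of job j_min.
            done = remaining[j_min]
            for j in remaining:
                remaining[j] -= done
            now = t_finish
            del remaining[j_min]
            departures[j_min] = now
    return [d - a for d, a in zip(departures, arrivals)]
```

For example, two unit-length jobs arriving together each stay for two time units, whereas an infinite-server queue would release both after one time unit.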

We now use the above proposition to provide a result on the busy-period sizes. From (K.1), we have the
multivariate stochastic order. Now,

$$X_{PS} \le_{st} X_{INF},\; \mathbf{T}_{PS} \ge_{st} \mathbf{T}_{INF} \;\Rightarrow\; B_{PS} \ge_{st} B_{INF},$$

on lines of Lemma 1. On lines of Theorem 3, we have (15). □